

JVI Accepted Manuscript Posted Online 12 August 2015 
J. Virol, doi:10.1128/JVI.01048-15 

Copyright © 2015, American Society for Microbiology. All Rights Reserved. 


1 SARS coronavirus ORF8 protein is acquired from SARS-related coronavirus 

2 from greater horseshoe bats through recombination 

3 

4 Susanna K. P. Lau,a,b,c,d Yun Feng,e,f Honglin Chen,a,b,c,d Hayes K. H. Luk,d Wei-Hong

5 Yang,e,f Kenneth S. M. Li,d Yu-Zhen Zhang,e,f Yi Huang,d Zhi-Zhong Song,e,f Wang-Ngai

6 Chow,d Rachel Y. Y. Fan,d Syed Shakeel Ahmed,d Hazel C. Yeung,d Carol S. F. Lam,d

7 Jian-Piao Cai,d Samson S. Y. Wong,a,b,c,d Jasper F. W. Chan,a,b,c,d Kwok-Yung Yuen,a,b,c,d

8 Hai-Lin Zhang,e,f Patrick C. Y. Woo a,b,c,d *

9 

10 State Key Laboratory of Emerging Infectious Diseases,a Research Centre of Infection and

11 Immunology,b Carol Yu Centre for Infection,c Department of Microbiology,d The

12 University of Hong Kong, Hong Kong, China; Yunnan Institute of Endemic Diseases

13 Control and Prevention,e Yunnan Provincial Key Laboratory for Zoonosis Control and

14 Prevention,f Dali, Yunnan, China

15 

16 Running title: Origin of SARS coronavirus ORF8 

17 

18 SKP Lau, Y Feng and H Chen contributed equally to the manuscript.

19 

20 *Corresponding authors. Mailing address: Patrick CY Woo, State Key Laboratory 

21 of Emerging Infectious Diseases, Department of Microbiology, The University of Hong 

22 Kong, Room 423, University Pathology Building, Queen Mary Hospital, Hong Kong, 

23 China. E-mail: pcywoo@hku.hk; Hai-Lin Zhang, Yunnan Institute of Endemic Diseases



24 Control and Prevention, Dali, Yunnan 671000, PR China. E-mail:

25 zhangHL715@163.com

26

27 Abstract: 248 words

28 Text: 5315 words

29



30 ABSTRACT 

31 Despite the identification of horseshoe bats as the reservoir of SARS-related- 

32 coronaviruses (SARSr-CoVs), the origin of SARS-CoV ORF8, which contains the 29-nt 

33 signature deletion among human strains, remains obscure. Although two SARSr-Rs- 

34 BatCoVs, RsSHC014 and Rs3367, previously detected from Chinese horseshoe bats 

35 (Rhinolophus sinicus) in Yunnan, possessed 95% genome identities to human/civet

36 SARSr-CoVs, their ORF8 exhibited only 32.2-33% aa identities to that of human/civet 

37 SARSr-CoVs. To elucidate the origin of SARS-CoV ORF8, we sampled 348 bats of 

38 various species in Yunnan, among which diverse alphacoronaviruses and

39 betacoronaviruses, including potentially novel CoVs, were identified, with some showing 

40 potential interspecies transmission. The genomes of two betacoronaviruses, SARSr-Rf-

41 BatCoV YNLF 31C and YNLF34C, from greater horseshoe bats (Rhinolophus

42 ferrumequinum), possessed 93% nt identities to human/civet SARSr-CoV genomes. 

43 Although they displayed lower similarities to civet SARSr-CoVs than SARSr-Rs-

44 BatCoV RsSHC014 and Rs3367 in S protein, their ORF8 demonstrated exceptionally

45 high (80.4-81.3%) aa identities to that of human/civet SARSr-CoVs, compared to 

46 SARSr-BatCoVs from other horseshoe bats (23.2-37.3%). Potential recombination events 

47 were identified around ORF8 between SARSr-Rf-BatCoVs and SARSr-Rs-BatCoVs, 

48 leading to the generation of civet SARSr-CoVs. The expression of ORF8 subgenomic 

49 mRNA suggested that this protein may be functional in SARSr-Rf-BatCoVs. The high 

50 Ka/Ks ratio among human SARS-CoVs compared to SARSr-BatCoVs supported that 

51 ORF8 is under strong positive selection during animal-to-human transmission. Molecular 

52 clock analysis using ORF1ab showed that SARSr-Rf-BatCoV YNLF 31C and
53 YNLF 34C diverged from civet/human SARSr-CoVs in approximately 1990. SARS-

54 CoV ORF8 originated from SARSr-CoVs of greater horseshoe bats through

55 recombination, which may be important for animal-to-human transmission. 
57 IMPORTANCE

58 Although horseshoe bats are the primary reservoir of SARS-related-coronaviruses 

59 (SARSr-CoVs), it is still unclear how these bat viruses have evolved to cross the species 

60 barrier to infect civet/human. Most human SARS-CoV epidemic strains contained a 

61 signature 29-nt deletion in ORF8 compared to civet SARSr-CoVs, suggesting that ORF8 

62 may be important for interspecies transmission. However, the origin of SARS-CoV ORF8 

63 remains obscure. In particular, SARSr-Rs-BatCoVs from Chinese horseshoe bats 

64 exhibited <40% aa identities to human/civet SARS-CoV in ORF8. We detected diverse 

65 alphacoronaviruses and betacoronaviruses among various bat species in Yunnan,

66 including two SARSr-Rf-BatCoVs from greater horseshoe bats that possessed an ORF8 

67 with exceptionally high aa identities to that of human/civet SARSr-CoVs. We 

68 demonstrated recombination events around ORF8 between SARSr-Rf-BatCoVs and 

69 SARSr-Rs-BatCoVs, leading to the generation of civet SARSr-CoVs. Our findings offer 

70 insight into the evolutionary origin of SARS-CoV ORF8 which was likely acquired from 

71 SARSr-CoVs of greater horseshoe bats through recombination. 

72 
73 INTRODUCTION

74 Coronaviruses (CoVs) are known to cause respiratory, enteric, hepatic and neurological 

75 diseases of varying severity in a variety of animals. They are currently classified into four 

76 genera, Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus, 

77 replacing the traditional three groups, group 1 to 3 (1-4). The genus Betacoronavirus is 

78 further classified into lineages A to D (3, 5, 6). Among CoVs that infect humans, human 

79 CoV 229E (HCoV 229E) and human CoV NL63 (HCoV NL63) belong to 

80 Alphacoronavirus ; human CoV OC43 (HCoV OC43) and human CoV HKU1 (HCoV 

81 HKU1) belong to Betacoronavirus lineage A; Severe Acute Respiratory Syndrome- 

82 related CoV (SARSr-CoV) belongs to Betacoronavirus lineage B; and the recently 

83 emerged Middle East Respiratory Syndrome CoV (MERS-CoV) belongs to 

84 Betacoronavirus lineage C (7-16). The high recombination rate, coupled with the 

85 infidelity of the RNA-dependent RNA polymerase (RdRp), may have facilitated CoVs to 

86 adapt to new hosts and ecological niches, causing epidemics in animals and humans (5, 

87 17-24). 

88 The SARS epidemic and identification of SARSr-CoVs from palm civet and 

89 horseshoe bats in China have boosted interests in the discovery of novel CoVs in both 

90 humans and animals especially bats (25-29). With the exception of lineage A 

91 betacoronaviruses, bats are now known to be an important reservoir of diverse 

92 alphacoronaviruses and lineage B, C and D betacoronaviruses (30-38), with bat CoVs

93 being the gene source for other mammalian CoVs (4). In particular, the findings of bat 

94 CoVs related to SARS-CoV and MERS-CoV suggested that bats may be the animal 

95 origin of both SARS and MERS epidemics; while other animals have served as the 
96 intermediate or amplifying hosts for animal-to-human transmission, palm civets in the

97 case of SARS and dromedary camels in MERS (25, 27, 28, 39-41). However, the 

98 evolutionary paths from bat CoVs to CoVs capable of infecting intermediate hosts and 

99 humans are not fully understood. 

100 SARSr-CoVs have been detected in at least 11 different species of horseshoe bats 

101 (genus Rhinolophus) from various countries in Asia, Africa and Europe (27, 28, 35, 37,

102 38, 42, 43). Related viruses have also been reported in bats of other genera, such as 

103 Chaerephon and Hipposideros, from Africa and China (43-45). However, it is still

104 unclear how these bat CoVs have evolved to generate the ancestor of civet/human 

105 SARSr-CoVs capable of crossing the species barrier. The genome organization of 

106 SARSr-CoVs, similar to other CoVs, possessed the characteristic gene order 5’-open 

107 reading frame 1ab (ORF1ab), spike (S), ORF3, envelope (E), membrane (M), ORF 6 to 8,

108 nucleocapsid (N)-3’. It is known that most human SARS-CoVs during the epidemic 

109 contained a signature 29-nt deletion in ORF8 compared to civet SARSr-CoVs (25), 

110 suggesting that this genomic region may be important for interspecies transmission. 

111 However, the origin of SARS-CoV ORF8 remains obscure. Genomes of SARS-related 

112 Rhinolophus sinicus BatCoVs (SARSr-Rs-BatCoVs), previously designated SARSr-Rh-

113 BatCoVs, from Chinese horseshoe bats (Rhinolophus sinicus) in Hong Kong and the

114 Guangdong Province only shared 87-92% nucleotide (nt) identities to human/civet 

115 SARSr-CoV genomes (22, 27, 28). A subsequent study identified two SARSr-Rs- 

116 BatCoVs, RsSHC014 and Rs3367, in the Yunnan Province, which were more closely 

117 related to human/civet SARSr-CoVs (with 95% genome sequence identities) than any 

118 other SARSr-BatCoVs (42). The S proteins of these two SARSr-Rs-BatCoVs from 
119 Yunnan shared 90.1-92.3% amino acid (aa) identities to those of human/civet SARSr-

120 CoVs, compared to 79-80% aa identities between SARSr-Rs-BatCoVs from Hong Kong

121 and human/civet SARSr-CoVs (27, 42). Moreover, a highly similar virus, SARSr-Rs- 

122 BatCoV WIV1, isolated in Vero E6 cells, was able to use angiotensin converting enzyme 

123 II (ACE2) from humans, civets, and Chinese horseshoe bats as receptor for cell entry, 

124 suggesting that intermediate hosts between bats and human/civets may not be necessary 

125 for interspecies transmission (42). However, considerable genetic distance still exists 

126 between the two SARSr-Rs-BatCoVs from Yunnan and human/civet SARSr-CoVs, 

127 especially in the ORF8 region with only 32.2-33% aa identities. 

128 To elucidate the evolutionary origin of SARS-CoV ORF8 and search for even 

129 closer bat CoV ancestors of SARS-CoV, we conducted a three-month study (May to July 

130 2013) on CoVs among various bats from different regions of the Yunnan Province. 

131 Diverse CoVs were detected, including two SARS-related Rhinolophus ferrumequinum 

132 BatCoVs (SARSr-Rf-BatCoVs) from greater horseshoe bats ( Rhinolophus 

133 ferrumequinum ), which possessed an expressed ORF8 much more closely related to 

134 human/civet SARSr-CoVs than CoVs detected from other bat species. Recombination 

135 and molecular clock analysis were also performed to elucidate the evolutionary paths and 

136 time of interspecies transmission of SARSr-CoVs. 
137 MATERIALS AND METHODS

138 Ethics statement. The collection of bat samples was approved and performed by the 

139 Yunnan Institute of Endemic Diseases Control and Prevention, Dali, Yunnan, China. All 

140 bats were maintained and handled using standard procedures approved by the Medical 

141 Ethical Committee of Yunnan Institute of Endemic Diseases Control and Prevention, 

142 China. 

143 Sample collection. Bats were captured from various locations in five counties of 

144 four prefectures of the Yunnan Province, China from May to July 2013 (Fig. 1). Samples 

145 were collected using procedures described previously (27, 46). All samples were placed 

146 in viral transport medium (Earle’s balanced salt solution, 0.09% glucose, 0.03% sodium 

147 bicarbonate, 0.45% bovine serum albumin, 50 mg/ml amikacin, 50 mg/ml vancomycin, 

148 40 U/ml nystatin) and stored at -80°C before RNA extraction. 

149 RNA extraction. Viral RNA was extracted from alimentary samples using 

150 QIAamp Viral RNA Mini Kit (QIAgen, Hilden, Germany). The RNA was eluted in 50 µl

151 of AVE buffer and was used as the template for RT-PCR. 

152 RT-PCR for CoVs and DNA sequencing. CoVs screening was performed by 

153 amplifying a 440-bp fragment of the RdRp gene of CoVs using conserved primers (5’- 

154 GGTTGGGACTATCCTAAGTGTGA-3’ and 5’- 

155 ACCATCATCNGANARDATCATNA-3’) targeted to RdRp genes of CoVs (12). 

156 Reverse transcription was performed using the Superscript III kit (Invitrogen, Life 

157 Technologies, Grand Island, NY, USA). The PCR mixture (25 µl) contained cDNA, PCR

158 buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl, 3 mM MgCl2 and 0.01% gelatin), 200 µM

159 of each dNTP and 1.0 U Taq polymerase (Applied Biosystems, Life Technologies,
160 Grand Island, NY, USA). The mixtures were amplified in 40 cycles of 94°C for 1 min,

161 48°C for 1 min and 72°C for 1 min and a final extension at 72°C for 10 min in an 

162 automated thermal cycler (Applied Biosystems). Standard precautions were taken to 

163 avoid PCR contamination and no false-positive was observed in negative controls. 
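As an illustration only, and not part of the protocol described above, the degeneracy of the reverse screening primer can be worked out from the standard IUPAC ambiguity codes. The following minimal Python sketch assumes the primer sequences exactly as printed above; the helper names `degeneracy` and `expand` are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text (5' to 3').
FORWARD_PRIMER = "GGTTGGGACTATCCTAAGTGTGA"
REVERSE_PRIMER = "ACCATCATCNGANARDATCATNA"


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str):
    """Yield every non-degenerate sequence encoded by the primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ("".join(combo) for combo in product(*pools))


if __name__ == "__main__":
    print(degeneracy(FORWARD_PRIMER), degeneracy(REVERSE_PRIMER))
```

Under these assumptions the forward primer is non-degenerate, whereas the reverse primer (three N, one R and one D position) encodes 4 x 4 x 2 x 3 x 4 = 384 variants.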

164 The PCR products were gel-purified using the QIAquick gel extraction kit 

165 (QIAgen). Both strands of the PCR products were sequenced twice with an ABI Prism 

166 3700 DNA Analyzer (Applied Biosystems), using the two PCR primers. The sequences 

167 of the PCR products were compared with known sequences of the RdRp genes of CoVs 

168 in the GenBank database. A phylogenetic tree was constructed from the 266-bp fragments

169 of the RdRp gene by the maximum likelihood method, using the General

170 Time Reversible substitution model with gamma distribution and allowance for evolutionarily

171 invariable sites (GTR+G+I), in MEGA 5.0 (47).

172 Viral culture. The two samples positive for SARSr-Rf-BatCoVs were subject to 

173 virus isolation in Vero E6 (African green monkey kidney) and primary R. sinicus lung 

174 cells as described previously (21). 

175 Complete genome sequencing and analysis of SARSr-Rf-BatCoVs. Two 

176 complete genomes of SARSr-Rf-BatCoVs were amplified and sequenced using RNA 

177 extracted from the alimentary samples as templates. RNA was converted to cDNA by a 

178 combined random-priming and oligo(dT) priming strategy. The cDNA was amplified by 

179 degenerate primers as described previously (27). A total of 75 sets of primers, available 

180 on request, were used for PCR. The 5’ end of the viral genome was confirmed by rapid 

181 amplification of cDNA ends using the 5'/3' SMARTer™ RACE cDNA Amplification Kit

182 (Clontech, USA). Sequences were assembled and manually edited to produce the final 
183 sequences. The nt sequences of the genomes and the deduced aa sequences of the ORFs

184 were compared to those of other CoVs using the coronavirus database CoVDB (48). 

185 Phylogenetic tree construction was performed using maximum likelihood method with 

186 MEGA 6.0. 

187 Recombination analysis. To detect possible recombination between different 

188 SARSr-BatCoVs and civet SARSr-CoVs, sliding window analysis was performed using 

189 nt alignment of the available genome sequences generated by ClustalX version 1.83 and 

190 edited manually with BioEdit version 7.1.3. Similarity Plot analysis and Bootscan 

191 analysis were performed using Simplot version 3.5.1 (49) (F84 model; window size, 1000 

192 bp; step, 200 bp) with civet SARSr-CoV SZ3 as query. 
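The sliding-window idea behind the similarity plot can be sketched as follows. This is a simplified illustration, not the Simplot implementation: it reports raw percent identity rather than F84-corrected distances, and the function name `window_identity` is hypothetical. Only the window size (1000) and step (200) are taken from the text.

```python
def window_identity(query: str, ref: str, window: int = 1000, step: int = 200):
    """Percent identity between two aligned sequences in sliding windows.

    Returns a list of (window_centre, percent_identity) tuples.
    Alignment columns with a gap ('-') in either sequence are skipped.
    """
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        compared = 0
        matches = 0
        for a, b in zip(query[start:start + window], ref[start:start + window]):
            if a == "-" or b == "-":
                continue
            compared += 1
            if a == b:
                matches += 1
        identity = 100.0 * matches / compared if compared else float("nan")
        results.append((start + window // 2, identity))
    return results
```

Plotting the identity of the query (here, civet SARSr-CoV SZ3) against each candidate parent along the genome would show abrupt changes in the closest relative, which is the pattern interpreted as a recombination breakpoint.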

193 Estimation of synonymous and non-synonymous substitution rates. The 

194 number of synonymous substitutions per synonymous site, Ks, and the number of non- 

195 synonymous substitutions per non-synonymous site, Ka, for each coding region were 

196 calculated for all available SARSr-Rf-BatCoV, SARSr-Rs-BatCoV, civet SARSr-CoV 

197 and human SARSr-CoV genomes using the Nei-Gojobori method (Jukes-Cantor) in 

198 MEGA 5.0. 
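For readers unfamiliar with the calculation, the Jukes-Cantor step of the Nei-Gojobori method converts the observed proportions of synonymous (pS) and non-synonymous (pN) differences into Ks and Ka by d = -(3/4) ln(1 - 4p/3). The sketch below assumes pS and pN have already been counted from a codon alignment; it is not the MEGA implementation, and the function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def ka_ks(p_n: float, p_s: float) -> float:
    """Ka/Ks ratio from observed non-synonymous (p_n) and synonymous (p_s) proportions."""
    ka = jukes_cantor(p_n)
    ks = jukes_cantor(p_s)
    if ks == 0.0:
        raise ZeroDivisionError("Ks is zero; the ratio is undefined")
    return ka / ks
```

A ratio below 1 is conventionally read as purifying selection and a ratio above 1 as positive selection, which is how the Ka/Ks values are interpreted in the Results.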

199 Estimation of divergence dates. The tMRCA was estimated based on an 

200 alignment of ORF1ab and nsp5 sequences, using the uncorrelated exponentially distributed

201 relaxed clock model (UCED) in BEAST version 1.8 (http://evolve.zoo.ox.ac.uk/beast/) 

202 (50). Under this model, the rates were allowed to vary at each branch drawn 

203 independently from an exponential distribution. The sampling dates of all strains were 

204 collected from the literature or from the present study, and were used as calibration points. 

205 Depending on the data set, Markov chain Monte Carlo (MCMC) sample chains were run 
206 for 2 × 10^8 states, sampling every 1,000 generations under the GTR nt substitution model,

207 determined by MODELTEST and allowing γ-rate heterogeneity for all data sets. A

208 constant population coalescent prior was assumed for all data sets. The median and HPD 

209 were calculated for each of these parameters from two identical but independent MCMC 

210 chains using TRACER 1.3 (http://beast.bio.ed.ac.uk). The tree was annotated by 

211 TreeAnnotator, a program of BEAST, and displayed by FigTree

212 (http://tree.bio.ed.ac.uk/software/figtree/).

213 Expression of ORF8 and determination of leader-body junction sequence. 

214 The leader-body junction site and flanking sequences of the ORF8 subgenomic mRNA in 

215 SARSr-Rf-BatCoV YNLF 31C were determined using RT-PCR as described previously 

216 (21, 51). Briefly, RNA was extracted directly from the bat samples using TRIzol Reagent 

217 (Invitrogen). Reverse transcription was performed using random hexamers and the 

218 Superscript III kit (Invitrogen). cDNA was PCR amplified with a forward primer (5’- 

219 CTACCCAGGAAAAGCCAAC-3’) located in the leader sequence and a reverse primer 

220 (5’-TGAACCATAGTGTGCCATCT-3’) located in the body of the ORF8 mRNA. The 

221 PCR mixture (25 µl) contained cDNA, PCR buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl,

222 2 mM MgCl2 and 0.01% gelatin), 200 µM of each dNTP and 1.0 U Taq polymerase

223 (Applied Biosystems). The mixtures were amplified in 60 cycles of 94°C for 1 min, 50°C 

224 for 1 min and 72°C for 1 min and a final extension at 72°C for 10 min in an automated 

225 thermal cycler (Applied Biosystems). RT-PCR products were subjected to agarose gel

226 electrophoresis, gel-purified using the QIAquick gel extraction kit (QIAgen) and sequenced to

227 obtain the leader-body junction sequences for the ORF8 subgenomic mRNA. 
228 Nucleotide sequence accession numbers. The nt and genome sequences of the

229 CoVs detected in this study have been lodged within the GenBank sequence database

230 under accession no. KP886808, KP886809, and KP895482 to KP895525. 
231 RESULTS

232 Detection of CoVs in bats. A total of 348 alimentary samples from bats belonging to 

233 five different genera were obtained from various regions of the Yunnan province. RT- 

234 PCR for a 440-bp fragment of the RdRp gene of CoVs was positive in alimentary 

235 samples from 46 bats of five species belonging to four genera (Table 1, Fig. 1). Sequence 

236 analysis showed that 35 samples contained diverse alphacoronaviruses, while 11 samples 

237 contained betacoronaviruses, including two lineage B betacoronaviruses and nine 

238 lineage D betacoronaviruses. 

239 Detection of diverse bat alphacoronaviruses. Phylogenetic analysis of the 440- 

240 bp fragments of the RdRp gene of alphacoronaviruses detected in 35 bat samples showed 

241 that two sequences from one Rhinolophus stheno and one Myotis daubentonii captured in

242 Mojiang possessed 92-93% nt identities to Rhinolophus bat CoV HKU2 (Rh-BatCoV

243 HKU2) (GenBank accession no. NC_009988.1) (Table 1, Fig. 2). Four sequences from M.

244 daubentonii in Xiangyun possessed 81% nt identity to Rh-BatCoV HKU2 (GenBank

245 accession no. NC_009988.1). Twenty-four sequences from M. daubentonii in Xiangyun

246 possessed 78-99% nt identities to Myotis bat CoV HKU6 (My-BatCoV HKU6) (GenBank

247 accession no. DQ249224.1). Two sequences from M. daubentonii in Mojiang possessed

248 96% nt identities to Miniopterus bat CoV HKU7 (Mi-BatCoV HKU7) (GenBank

249 accession no. DQ249226.1). One sequence from M. daubentonii in Mojiang possessed

250 96% nt identities to Miniopterus bat CoV HKU8 (GenBank accession no. NC_010438.1).

251 Two sequences from Hipposideros pomona in Mojiang possessed 81-87% nt identities to

252 Hipposideros bat CoV HKU10 (Hi-BatCoV HKU10) (GenBank accession no.

253 JQ989267.1). 
254 Detection of lineage B and D bat betacoronaviruses. Phylogenetic analysis of

255 the 440-bp fragments of the RdRp gene of betacoronaviruses detected in two bat samples, 

256 YNLF31C and YNLF34C, showed that they belonged to Betacoronavirus lineage B, 

257 with 100% nt identities to human SARS-CoV TOR2 (GenBank accession no. 

258 AY274119.3) and 90% nt identities to SARSr-Rs-BatCoV HKU3 (GenBank accession no.

259 DQ022305), thus representing SARSr-Rf-BatCoVs (Table 1, Fig. 2). Both samples were 

260 collected from greater horseshoe bats (Rhinolophus ferrumequinum) captured in Lufeng

261 County, Chuxiong Yi Autonomous Prefecture (Fig. 1). Phylogenetic analysis of the 440- 

262 bp fragments of the RdRp gene of betacoronaviruses detected in nine other bat samples 

263 showed that they belonged to Betacoronavirus lineage D, with 75-79% nt identities to 

264 Rousettus bat coronavirus HKU9 (Ro-BatCoV HKU9) (GenBank accession no.

265 NC_009021.1). These nine samples were collected from Leschenault’s rousettes

266 ( Rousettus leschenaulti) in Mengla County, Xishuangbanna Dai Autonomous Prefecture. 

267 Attempts to passage SARSr-Rf-BatCoV YNLF 31C and YNLF 34C in various cell lines

268 were not successful, with no cytopathic effect or viral replication being detected. 

269 Genome comparison between SARSr-Rf-BatCoV and other SARSr-CoVs. 

270 The complete genome sequences of the two SARSr-Rf-BatCoV strains, YNLF 31C and 

271 YNLF 34C, were obtained by assembly of the sequences of RT-PCR products obtained 

272 directly from alimentary samples. Their genome sizes were 29723 bases, with G + C 

273 content 40.7%, comparable to those of most other SARSr-CoVs (27, 28). They were 

274 closely related to each other with 99.9% overall nt identities, while they possessed 88.2% 

275 nt identities to the genomes of SARSr-Rs-BatCoV HKU3 and 93% nt identities to the

276 genomes of human/civet SARSr-CoVs. SARSr-Rf-BatCoV strains share similar genome
277 organization with other SARSr-CoV strains, containing the putative transcription

278 regulatory sequence (TRS) motif, 5’-ACGAAC-3’, at the 3’ end of the 5’ leader sequence 

279 and preceding each ORF except ORF 7b. Similar to most other SARSr-BatCoVs, SARSr- 

280 Rf-BatCoV YNLF 31C and YNLF 34C contained a single long ORF8. 

281 The nsp3, S, ORF3 and ORF8 regions are known to be the most rapidly evolving 

282 regions among SARSr-CoV genomes (27, 28, 52, 53). Pairwise comparison of aa 

283 sequences between civet SARSr-CoV SZ3 and other SARSr-CoVs showed that the S and 

284 ORF3a of SARSr-Rf-BatCoV YNLF 31C and YNLF 34C displayed relatively low 

285 sequence identities to civet SARSr-CoV (Table 2). However, the nsp3 of SARSr-Rf-

286 BatCoV YNLF 31C and YNLF 34C exhibited 97.1% aa identity to civet SARSr-CoV,

287 which is comparable to the high sequence identity of 96.8 to 97.5% between civet 

288 SARSr-CoV and SARSr-BatCoVs, Rs3367, RsSHC014, WIV1 and BtCoV-Cp/2011, 

289 from Yunnan reported previously (42). Furthermore, an exceptionally high sequence 

290 identity (80.4-81.3% aa identity) was observed in the ORF8 between SARSr-Rf-BatCoVs 

291 and human/civet SARSr-CoVs, much higher than that between human/civet SARSr- 

292 CoVs and other SARSr-BatCoVs (23.2-37.3% aa identity). Therefore, civet SARSr-CoV 

293 SZ3 was most closely related to SARSr-Rs-BatCoV Rs3367 and WIV1 in S and ORF3a, 

294 but was most closely related to SARSr-Rf-BatCoVs in ORF8. 

295 The predicted receptor binding domain (RBD) of SARSr-Rf-BatCoV YNLF 31C 

296 and YNLF 34C possessed 89% and 68.1% aa identities to that of SARSr-Rs-BatCoV 

297 HKU3-1 and civet SARSr-CoV SZ3 respectively. Previous studies have identified five

298 critical residues (residues 442, 472, 479, 487 and 491) for ACE2 binding in human and 

299 civet SARSr-CoVs (54). In particular, residues 479 and 487 are the two key residues that 
300 are different between human and civet SARSr-CoV strains, with S→T substitution at

301 residue 487 resulting in 20-fold reduction in human ACE2 binding affinity (54). In 

302 SARSr-Rs-BatCoV Rs3367, two (residues 479 and 491) of the five critical residues were 

303 conserved. In SARSr-Rf-BatCoVs and most other SARSr-Rs-BatCoVs, only residue 491 

304 was conserved (Fig. 3). Compared to human/civet SARSr-CoVs and SARSr-Rs-BatCoV 

305 Rs3367, WIV1 and RsSHC014, the RBD of SARSr-Rf-BatCoV YNLF31C and 

306 YNLF34C, similar to some SARSr-BatCoV strains, contained two deletions of 5 aa and 

307 12 aa respectively. 

308 Phylogenetic analysis. Phylogenetic trees were constructed using nsp2, nsp3, 

309 nsp5, nsp12 (RdRp), S, ORF3a, ORF8 and N of SARSr-Rf-BatCoV YNLF 31C and

310 YNLF 34C and other SARSr-CoVs (Fig. 4). These regions were selected because they 

311 were commonly used in phylogenetic analysis of CoVs (RdRp, S, N), represent regions 

312 of rapid evolution in SARSr-CoVs (nsp3, ORF3, ORF8), or were free from recombination

313 upon subsequent analysis (nsp2, nsp5). In nsp2, nsp3, nsp5, RdRp, and N genes, SARSr-

314 Rf-BatCoV YNLF 31C and YNLF 34C were more closely related to other SARSr-

315 BatCoVs than to two other SARSr-Rf-BatCoV strains, Rf1 and BtCoV/273/2005,

316 previously detected from greater horseshoe bats in Hubei (28, 37). However, in S, ORF3

317 and ORF8, SARSr-Rf-BatCoV YNLF 31C and YNLF 34C were most closely related to

318 SARSr-Rf-BatCoV Rf1 and BtCoV/273/2005, forming a distinct cluster among other

319 SARSr-BatCoVs.

320 In S and ORF3 region, human/civet SARSr-CoVs were most closely related to 

321 SARSr-Rs-BatCoV Rs3367, WIV1 and RsSHC014 previously detected from Yunnan

322 bats (42). This is in line with the ability of SARSr-Rs-BatCoV WIV1 to replicate in 
323 Vero E6 cells and use ACE2 as receptor (42). In nsp3, human/civet SARSr-CoVs were

324 most closely related to SARSr-Rf-BatCoV YNLF 31C and YNLF34C as well as 

325 SARSr-Rs-BatCoV Rs3367, WIV1 and RsSHC014. Furthermore, in ORF8, SARSr-Rf- 

326 BatCoV strains were clustered with human and civet SARSr-CoV strains with high 

327 bootstrap value of 990, whereas all SARSr-Rs-BatCoV strains, including Rs3367, WIV1 

328 and RsSHC014, formed another cluster. This concurs with results from pairwise aa 

329 sequence comparison, and suggests that the ORF8 of civet and human SARSr-CoV

330 originated from SARSr-Rf-BatCoVs from greater horseshoe bats instead of SARSr-Rs- 

331 BatCoV from Chinese horseshoe bats. 

332 Recombination analysis. Since the ORF8 of SARSr-Rf-BatCoVs showed high 

333 sequence identity to those of human/civet SARSr-CoVs, we hypothesize that the ancestor 

334 of civet SARSr-CoVs has acquired its ORF8 from SARSr-Rf-BatCoVs through 

335 recombination between SARSr-Rf-BatCoVs from greater horseshoe bats and SARSr-Rs- 

336 BatCoVs from Chinese horseshoe bats. When civet SARSr-CoV SZ3 was used as the 

337 query for sliding window analysis with SARSr-Rf-BatCoV YNLF 31C and SARSr-Rs- 

338 BatCoV Rs3367 and HKU3 as potential parents, several recombination breakpoints were 

339 observed. In particular, two breakpoints, between which ORF8 was located, were 

340 identified (Fig. 5). Downstream to the first breakpoint at position 27128 and upstream to 

341 the second breakpoint at position 28635, an abrupt change in clustering occurred with 

342 high bootstrap support for clustering of civet SARSr-CoV SZ3 with SARSr-Rf-BatCoV 

343 YNLF 31C. This is in line with results from phylogenetic and similarity plot analysis. 

344 Moreover, using multiple alignments, civet SARSr-CoV SZ3 was shown to possess much 

345 higher sequence similarities to SARSr-Rf-BatCoVs than to SARSr-Rs-BatCoVs within 
346 ORF8, which includes the region corresponding to the 29-nt deletion found in human

347 SARS-CoVs (Fig. 5). 

348 Besides ORF8, another region of interest was S which was situated between two 

349 breakpoints at position 20900 and 26100 respectively (Fig. 5). Downstream to position 

350 20900 and upstream to position 26100, an abrupt change in clustering occurred with high 

351 bootstrap support for clustering of civet SARSr-CoV SZ3 with SARSr-Rs-BatCoV 

352 Rs3367. This is in line with phylogenetic analysis and the ability of strain Rs3367 to use 

353 ACE2 as receptor for cellular entry (42). However, similarity plot analysis still showed

354 substantial difference between the S of civet SARSr-CoV SZ3 and SARSr-Rs-BatCoV

355 Rs3367, especially in the S1 region.

356 Estimation of synonymous and non-synonymous substitution rates. Using all 

357 available SARSr-BatCoV genome sequences for analysis, the Ka/Ks ratios for various 

358 coding regions, as compared to those of civet SARSr-CoVs and human SARS-CoVs, are 

359 shown in Table 3. Notably, the Ka/Ks ratios for most coding regions of SARSr-BatCoVs, 

360 including ORF8 of SARSr-Rf-BatCoVs, were low, supporting purifying selection. In

361 contrast, many regions of civet SARSr-CoVs and human SARS-CoVs exhibited 

362 relatively high Ka/Ks ratios suggestive of positive selection. Positive selection was 

363 particularly strong at the S (Ka/Ks=3) and ORF3 (Ka/Ks=2) of civet SARSr-CoVs, and 

364 the M (Ka/Ks=2) and ORF8 (Ka/Ks=3.5) of human SARS-CoVs. 

365 Estimation of divergence dates. Using the uncorrelated relaxed clock model on 

366 ORF1ab, the time of the most recent common ancestor (tMRCA) of all SARSr-CoVs was

367 estimated to be 1960.1 [highest posterior density regions at 95% (HPD), 1899.1 to 

368 1988.6]. The tMRCA of human and civet SARSr-CoVs was estimated to be 2001.5 
369 (HPDs, 1999.1 to 2002.5), approximately 2 years before the SARS epidemic. The

370 tMRCA of human/civet SARSr-CoVs, SARSr-Rp-BatCoV Rp3/2004, and SARSr-Rs-

371 BatCoV RsSHC014/2011, Rs3367/2012 and WIV1/2012 was estimated to be 1995.3

372 (HPDs, 1984.5 to 2001), while that of human/civet SARSr-CoVs, and SARSr-Rf- 

373 BatCoVs, was estimated to be 1990.6 (HPDs, 1973.2 to 1999.6) (Fig. 6). 

374 Since some regions in ORF1ab may be involved in recombination (Fig. 5), nsp5,

375 which was free from recombination, was also used for analysis and showed similar tree 

376 topology. Using the uncorrelated relaxed clock model on nsp5, the time of the most 

377 recent common ancestor (tMRCA) of all SARSr-CoVs was estimated to be 1961.5 

378 [highest posterior density regions at 95% (HPD), 1898.9 to 1991.5]. The tMRCA of 

379 human and civet SARSr-CoVs was estimated to be 2000.7 (HPDs, 1996.7 to 2002.6), 

380 approximately 2 years before the SARS epidemic. The tMRCA of human/civet SARSr-

381 CoVs, SARSr-Rp-BatCoV Rp3/2004, and SARSr-Rs-BatCoV RsSHC014/2011,

382 Rs3367/2012 and WIV1/2012 was estimated to be 1996.3 (HPDs, 1985.2 to 2001.7), 

383 while that of human/civet SARSr-CoVs, and SARSr-Rf-BatCoVs, was estimated to be 

384 1989.9 (HPDs, 1969.6 to 2000.3) (Fig. 6). The estimated mean substitution rates of the

385 ORF1ab and nsp5 data sets under the uncorrelated exponentially distributed relaxed clock

386 model (UCED) were 2.00 xl0° and 1.36 x 10'~ substitution per site per year respectively, 

387 which are comparable to other CoVs and RNA viruses (55, 56). 

388 Expression of ORF8 and determination of leader-body junction sequence. 

389 CoVs are characterized by a unique mechanism of discontinuous transcription with the

390 synthesis of a nested set of subgenomic mRNAs (1, 2). To determine if ORF8 is 

391 expressed in SARSr-Rf-BatCoV and the location of the leader and body TRS used for 
392 mRNA synthesis, the leader-body junction sites and flanking sequences of ORF8

393 subgenomic mRNA were determined. The obtained subgenomic mRNA sequence was 

394 aligned to the leader sequence which confirmed the core sequence of the TRS motifs as 

395 5’-ACGAAC-3’ (Fig. 7), as in other SARSr-CoVs. The leader TRS and the ORF8 

396 subgenomic mRNA exactly matched each other. The SARSr-Rf-BatCoV leader was 

397 confirmed as the first 66 nt of the genome.
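The motif search underlying this kind of comparison can be sketched in a few lines of Python. The example below locates the core TRS 5'-ACGAAC-3' in a sequence; the sequences are hypothetical toy strings made up for illustration, not the YNLF_31C leader or ORF8 subgenomic mRNA, and the function name is an assumption.

```python
from typing import List


def find_motif(sequence: str, motif: str = "ACGAAC") -> List[int]:
    """Return 0-based start positions of every occurrence of motif.

    RNA input is accepted: U is treated as T before searching.
    Overlapping occurrences are reported.
    """
    seq = sequence.upper().replace("U", "T")
    pattern = motif.upper().replace("U", "T")
    hits = []
    start = seq.find(pattern)
    while start != -1:
        hits.append(start)
        start = seq.find(pattern, start + 1)
    return hits


# Hypothetical toy sequences (illustration only, not real viral sequence).
toy_leader = "GGCTTAAACGAACTTTGG"
toy_body = "CCTCATCACGAACATGAAACTT"

print("leader TRS positions:", find_motif(toy_leader))
print("body TRS positions:", find_motif(toy_body))
```

A real analysis would compare the sequenced subgenomic mRNA against both the genomic leader and the region upstream of ORF8, as shown in Fig. 7; the sketch covers only the motif lookup.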

398 
399 DISCUSSION

400 The ORF8 of civet SARSr-CoV is likely to have been acquired from SARSr-Rf-BatCoVs 

401 in greater horseshoe bats (R. ferrumequinum) through recombination. In this study, two

402 SARSr-Rf-BatCoV strains, YNLF_31C and YNLF_34C, were identified from greater

403 horseshoe bats. Although their genomes only possessed 93% nt identities to the genomes 

404 of human/civet SARSr-CoVs, which is lower than the 95% nt identities between 

405 human/civet SARSr-CoV and SARSr-Rs-BatCoVs, Rs3367 and RsSHC014, from 

406 Chinese horseshoe bats in Yunnan, the nsp3 and ORF8 of SARSr-Rf-BatCoV 

407 YNLF_31C and YNLF_34C exhibited the highest aa identities among all SARSr-

408 BatCoVs to those of civet SARSr-CoV SZ3. In particular, their ORF8 demonstrated much

409 higher aa identities (81.3%) to civet SARSr-CoV SZ3 than SARSr-BatCoVs from other 

410 horseshoe bats (23.2% to 37.3%). Phylogenetic analysis of the ORF8 revealed a distinct 

411 clade formed by human/civet SARSr-CoVs and SARSr-Rf-BatCoVs separate from other 

412 SARSr-BatCoVs. This is in line with a previous report showing that the ORF8 of SARSr-

413 Rf-BatCoV Rf1 clustered with human/civet SARSr-CoVs but not SARSr-BatCoV

414 Rm1 and Rp3 upon phylogenetic analysis, although only one SARSr-Rf-BatCoV strain

415 was available for analysis (28). Moreover, potential recombination sites were identified 

416 between SARSr-Rf-BatCoVs and SARSr-Rs-BatCoVs around the ORF8 region, leading 

417 to the generation of civet SARSr-CoV SZ3 with the ORF8 acquired from SARSr-Rf- 

418 BatCoVs. Similar to other regions of the genome, the ORF8 of SARSr-Rf-BatCoVs has 

419 been under purifying selection, which supports greater horseshoe bats as a reservoir for 

420 SARSr-Rf-BatCoVs. In contrast, the ORF8 of human SARS-CoVs was under strong 

421 positive selection, which reflects the rapid evolution soon after interspecies jumping. 
422 These findings support recombination as the key mechanism involved in the

423 acquisition of ORF8 by the ancestor of civet SARSr-CoVs. In fact, previous studies have 

424 demonstrated frequent recombination events between SARSr-Rs-BatCoV strains from 

425 different bat species of different geographical locations in China (22, 55). Moreover, a 

426 recombination breakpoint at the nsp16/S intergenic region was detected between SARSr-Rp-

427 BatCoV Rp3 from Pearson’s horseshoe bats (Rhinolophus pearsoni) and SARSr-Rf-

428 BatCoV Rf1 during the evolution of SARSr-BatCoVs to civet SARSr-CoV (22). On the

429 other hand, some genomic regions of SARSr-Rf-BatCoV YNLF_31C and YNLF_34C,

430 such as nsp3, RdRp and N, were evolutionarily distinct from two previously reported

431 SARSr-Rf-BatCoV strains, Rf1 and 273/2005, upon phylogenetic analysis. This suggests

432 that SARSr-Rf-BatCoVs from different geographical locations in China may have 

433 evolved separately through other recombination events. The present findings offer new 

434 insights into the origin and evolution of SARS-CoV, by showing that the ancestor of civet 

435 SARSr-CoV is likely a recombinant virus with ORF8 originating from SARSr-Rf-

436 BatCoVs in greater horseshoe bats and other genome regions from different horseshoe

437 bats. 
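The identity percentages compared in this paragraph are pairwise identities over aligned sequences. The following is a minimal Python sketch of that calculation, assuming the two sequences are already aligned to equal length and that columns containing a gap in either sequence are skipped; the gap convention, the function name and the toy sequences are assumptions made for illustration, and alignment programs may count gaps differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy peptides (not real ORF8 sequences).
print(round(percent_identity("MKLLIV", "MKLFIV"), 1))
print(round(percent_identity("MK-LIV", "MKALIV"), 1))
```

Applied to real alignments, the same calculation would yield figures of the kind listed in Table 2, subject to the gap convention chosen.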

438 Although SARSr-Rs-BatCoV Rs3367 and RsSHC014 represented the closest bat

439 CoVs to SARS-CoV in terms of genome identity, they were unlikely to be the immediate

440 ancestor of civet SARSr-CoVs. Previous molecular-dating studies estimated that the time 

441 of divergence between human/civet and bat SARSr-CoVs ranged from 4 to 17 years 

442 before the SARS epidemic (22, 55, 57). SARSr-CoVs were also shown to be a newly 

443 emerged subgroup of Betacoronavirus, with the median date of their MRCA estimated to

444 be from 1961 to 1982 (55, 57). The present results are in line with such estimations, with 
445 the tMRCA between human/civet and closest bat strains estimated to be approximately

446 1995 (8 years before the SARS epidemic) and that among all SARSr-CoVs 

447 approximately 1960 using ORF1ab. Similar results were also obtained when using the nsp5

448 region, which was recombination-free. Moreover, we demonstrated that SARSr-Rf-

449 BatCoV YNLF_31C and YNLF_34C only diverged from civet/human SARSr-CoVs at

450 approximately 1990. This is in contrast to previous studies that showed SARSr-Rp- 

451 BatCoV Rp3 as the only recently diverged strain (55, 57). Together with the evidence on 

452 the acquisition of ORF8, it is likely that civet SARSr-CoV originated from

453 recombination between SARSr-Rs-BatCoVs and SARSr-Rf-BatCoVs from different

454 horseshoe bat species within several years before the SARS epidemic. 
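The intervals quoted in this paragraph follow from simple subtraction of the decimal-year tMRCA estimates. The Python sketch below makes the arithmetic explicit using the ORF1ab estimates reported in the Results; the reference year of 2003 is an assumption inferred from the statement that 1995 is about 8 years before the SARS epidemic, and the function name and labels are illustrative.

```python
# Assumption: 2003 is the reference year implied by
# "1995 (8 years before the SARS epidemic)".
EPIDEMIC_YEAR = 2003.0


def years_before_epidemic(tmrca: float, reference: float = EPIDEMIC_YEAR) -> float:
    """Years between a decimal-year tMRCA estimate and the reference year."""
    return reference - tmrca


# Mean tMRCA estimates from ORF1ab, as reported in the Results.
orf1ab_tmrca = {
    "all SARSr-CoVs": 1960.1,
    "human/civet SARSr-CoVs and SARSr-Rf-BatCoVs": 1990.6,
    "human/civet SARSr-CoVs and closest bat strains": 1995.3,
    "human and civet SARSr-CoVs": 2001.5,
}

for label, year in orf1ab_tmrca.items():
    print(f"{label}: {years_before_epidemic(year):.1f} years before {EPIDEMIC_YEAR:.0f}")
```

The same subtraction applied to the HPD bounds gives the corresponding uncertainty intervals.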

455 The overlapping habitat and geographical distribution of different horseshoe bats 

456 may have fostered recombination between different SARSr-BatCoVs and emergence of 

457 SARS-CoV. Chinese horseshoe bats are widely distributed throughout China including 

458 Yunnan, Guangdong and Hong Kong. While greater horseshoe bats are also widely

459 distributed across different provinces in China including Yunnan, they are not found in 

460 Guangdong (58). The two bat species share similar diets and habits such as the ability to

461 roost in man-made structures, suggesting that they may cohabit in similar

462 environments in Yunnan, the province with the highest biodiversity in China. In fact, 

463 SARSr-Rf-BatCoV YNLF_31C and YNLF_34C, and SARSr-Rs-BatCoV Rs3367 and

464 RsSHC014 were detected in Lufeng and Kunming of the Yunnan province respectively,

465 which were only ~80 km apart and within the migration distances of horseshoe bats (Fig. 

466 1) (22, 59, 60). Since greater horseshoe bats are not found in Guangdong, recombination 

467 between SARSr-Rf-BatCoVs and SARSr-Rs-BatCoVs with the generation of the ancestor

468 of civet SARSr-CoVs may have occurred in yet unidentified bats in Yunnan or nearby

469 provinces, which were then transported to wildlife markets in Guangdong and infected 

470 civets. Alternatively, recombination may have occurred in civets or other animals within 

471 wildlife farms or markets where many different wild animal species are often housed 

472 together (61). A possible scenario is that the animals were co-infected with SARSr-Rf- 

473 BatCoVs and SARSr-Rs-BatCoVs from different horseshoe bats, followed by 

474 recombination events. More extensive surveillance in bats from Yunnan and neighboring 

475 provinces, as well as wildlife markets in Guangdong may reveal the immediate ancestor 

476 of civet SARSr-CoVs. 

477 The ORF8 region, unique to SARSr-CoVs, is prone to mutations or deletions 

478 during interspecies transmission. One of the most striking genomic changes observed in 

479 SARS-CoV soon after its zoonotic transmission to humans was the acquisition of a 

480 characteristic 29-nt deletion which splits ORF8 into two ORFs, ORF8a and ORF8b (25, 

481 62). While SARS-CoVs isolated from the later human cases of the epidemic contained 

482 this 29-nt deletion, isolates from civets and some early human cases possessed a single 

483 continuous ORF8 (25, 63). Besides, some early human strains and a farmed civet strain 

484 from Hubei possessed an alternative 82-nt deletion in ORF8 (63). On the other hand, four

485 late human isolates possessed a 415-nt deletion, resulting in the loss of the entire ORF8 

486 (63). Although studies using reverse genetics showed that the ORF8 is not essential for 

487 virus replication in vitro and in vivo (64, 65), the full-length 8ab protein is a functional 

488 protein that is delivered by a cleavable signal sequence to the lumen of the endoplasmic 

489 reticulum where it becomes N-glycosylated (62). Different subcellular localizations and

490 functions have also been reported for 8ab, 8a and 8b proteins (66-69). Inside the

491 endoplasmic reticulum, 8ab activates the ATF6 branch of the unfolded-protein response (70).

492 The 8a protein enhances SARS-CoV replication and induces caspase-dependent apoptosis 

493 through a mitochondria-dependent pathway (66). Moreover, antibodies against 8a protein 

494 have been detected in sera of SARS patients (66). The 8b protein down-regulates the 

495 expression of the E protein, which supports a modulatory role in viral replication (68).

496 Moreover, overexpression of the 8b protein induces DNA synthesis (67). The 8b and 8ab 

497 proteins also play a role in the host ubiquitin-proteasome system (71). In this study, the 

498 expression of ORF8 subgenomic mRNA in SARSr-Rf-BatCoV YNLF_31C suggested

499 that this protein may also be functional in SARSr-BatCoVs. Moreover, the high Ka/Ks

500 ratio among human SARS-CoVs compared to SARSr-BatCoVs suggests that ORF8 is

501 subject to rapid evolution under strong positive selection during animal-to-human 

502 transmission. Further studies may help understand the importance of ORF8 evolution for 

503 interspecies transmission of SARSr-CoVs. 
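The effect of the ORF8 deletions described above on the reading frame can be checked with simple modular arithmetic: a deletion whose length is not a multiple of three shifts the downstream reading frame. The short Python sketch below applies this to the three deletion lengths mentioned in the text. It is an illustration only; the function name is an assumption, and the check says nothing about where stop codons arise or which ORFs result.

```python
def preserves_reading_frame(deletion_length_nt: int) -> bool:
    """True when a deletion removes whole codons (length divisible by 3)."""
    if deletion_length_nt < 0:
        raise ValueError("deletion length must be non-negative")
    return deletion_length_nt % 3 == 0


# Deletion lengths (nt) mentioned in the text.
for length in (29, 82, 415):
    remainder = length % 3
    status = "in frame" if preserves_reading_frame(length) else "frameshift"
    print(f"{length}-nt deletion: {length} mod 3 = {remainder} ({status})")
```

None of the three lengths is a multiple of three, which is consistent with the 29-nt deletion disrupting the single continuous ORF8.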

504 Besides SARSr-BatCoVs, diverse alphacoronaviruses and betacoronaviruses, 

505 including potentially novel CoVs, with potential interspecies transmission events were 

506 identified in this study. Bats are known important reservoirs of lineage B, C and D 

507 betacoronaviruses, while rodents are likely the reservoir of lineage A betacoronaviruses 

508 (30). Nine samples belonging to lineage D betacoronaviruses were detected in 

509 Leschenault’s rousettes (R. leschenaulti), a known reservoir of Ro-BatCoV HKU9 (24).

510 However, the partial RdRp sequences only possessed 75-79% nt identities to the latter,

511 suggesting that they may represent either novel CoV species or a novel genotype of Ro-

512 BatCoV HKU9. As for alphacoronaviruses, 24 samples from Daubenton’s bats (M.

513 daubentonii) contained viruses most closely related to My-BatCoV HKU6 with 78-99%

514 nt identities in the partial RdRp region, which may represent My-BatCoV HKU6 or

515 related viruses previously reported in the same bat species (38). Six samples contained 

516 alphacoronaviruses most closely related to Rh-BatCoV HKU2. However, four samples 

517 (YNXY_7C, YNXY_10C, YNXY_45 and YNXY_50C) from Daubenton’s bats

518 possessed partial RdRp sequences of only 80-81% nt identities to that of Rh-BatCoV

519 HKU2, suggesting that they may represent novel CoVs. Although the other two samples 

520 (MJ_27C and MJ_69C) possessed RdRp sequences with 92-93% identities to that of Rh-

521 BatCoV HKU2, they were detected from Daubenton’s bats and lesser brown horseshoe

522 bats (R. stheno) instead of Chinese horseshoe bats (R. sinicus) previously reported to

523 carry Rh-BatCoV HKU2 (34). This may suggest interspecies transmission of Rh-BatCoV 

524 HKU2 among different bat species. Two samples from Pomona roundleaf bats 

525 (Hipposideros pomona) contained alphacoronaviruses most closely related to Hi-

526 BatCoV HKU10. However, the partial RdRp sequences only possessed 81-87% nt

527 identity to the latter. We have previously described recent interspecies transmission of 

528 BatCoV HKU10 between Leschenault’s rousettes (R. leschenaulti) and Pomona 

529 roundleaf bats, two very different bats belonging to different families, through rapid 

530 evolution of the S protein (72). Further studies are warranted to determine if the two 

531 samples from Pomona roundleaf bats contained potentially novel CoVs closely related to 

532 BatCoV HKU10 or variants of BatCoV HKU10 due to interspecies transmission. 
808 LEGENDS TO FIGURES

809 FIG 1 Map showing five locations of bat sampling in four autonomous prefectures (AP) in 

810 Yunnan Province, China. Sampling locations in Yunnan are in red. The location of SARSr-Rs- 

811 BatCoV strains, Rs3367 and RsSHC014, detected in a previous study (42) is in blue. 

812 FIG 2 Phylogenetic analysis of the nt sequences of the 267-nt fragment of RdRp of the 46 

813 positive samples identified in bats in Yunnan in this study. The tree was constructed by 

814 maximum likelihood method with the model GTR+G. Bootstrap values were calculated from 

815 1000 trees and only values >700 are shown and given at nodes. The scale bar indicates 5 nt 

816 substitutions per site. The two SARSr-Rf-BatCoV strains YNLF_31C and YNLF_34C are in

817 red. The potentially novel bat CoVs are in purple. AntelopeCoV, sable antelope coronavirus

818 (EF424621); BatCoV CDPHE15/USA/2006, Bat coronavirus CDPHE15/USA/2006

819 (NC_022103.1); BatCoV/SC2013, Betacoronavirus/SC2013 (KJ473821.1); Erinaceus

820 CoV/VMC/DEU/2012, Betacoronavirus Erinaceus/VMC/DEU/2012 (NC_022643); BCoV, bovine

821 coronavirus (NC_003045); BdHKU22, bottlenose dolphin coronavirus HKU22 (KF793826);

822 BuCoV HKU11, bulbul coronavirus HKU11 (FJ376619); BWCoV SW1, beluga whale

823 coronavirus SW1 (NC_010646); CCoV, Canine coronavirus strain CCoV/NTU336/F/2008

824 (GQ477367.1); CCRCoV, Canine respiratory coronavirus strain K37 (JX860640.1); CmCoV

825 HKU21, common moorhen coronavirus HKU21 (NC_016996); CoV Neoromicia/PML-

826 PHE1/RSA/2011, coronavirus Neoromicia/PML-PHE1/RSA/2011 (KC869678); DcCoV

827 HKU23, dromedary camel coronavirus HKU23 (KF906251); ECoV, equine coronavirus

828 (NC_010327); FIPV, feline infectious peritonitis virus (AY994055); GiCoV, Giraffe

829 coronavirus US/OH3-TC/2006 (EF424622.1); HCoV-229E, human coronavirus 229E

830 (NC_002645); HCoV-HKU1, human coronavirus HKU1 (NC_006577); HCoV-NL63, human

831 coronavirus NL63 (NC_005831); HCoV-OC43, human coronavirus OC43 (NC_005147); Hi-

832 batCoV HKU10, Hipposideros bat coronavirus HKU10 (JQ989269); IBV-beaudette, beaudette

833 coronavirus (AY692454); Human MERS-CoV, middle east respiratory syndrome

834 coronavirus (NC_019843.3); Human MERS-CoV EMC/2012, Human betacoronavirus

835 2c EMC/2012 (JX869059.2); Camel MERS-CoV KSA-CAMEL-363, middle east respiratory

836 syndrome coronavirus isolate KSA-CAMEL-363 (KJ713298); MRCoV HKU18, magpie robin

837 coronavirus HKU18 (NC_016993); BatCoV 1A, Miniopterus bat coronavirus 1A (NC_010437);

838 BatCoV 1B, Miniopterus bat coronavirus 1B (NC_010436); Mi-batCoV HKU7, Miniopterus bat

839 coronavirus HKU7 (DQ249226); Mi-batCoV HKU8, Miniopterus bat coronavirus HKU8

840 (NC_010438); Mink CoV strain WD1127, Mink coronavirus strain WD1127 (NC_023760.1);

841 MunCoV HKU13, munia coronavirus HKU13 (FJ376622); MHV-A59, murine hepatitis

842 virus (NC_001846); My-batCoV HKU6, Myotis bat coronavirus HKU6 (DQ249224); NH CoV

843 HKU19, night heron coronavirus HKU19 (NC_016994); PEDV, porcine epidemic diarrhoea

844 virus (NC_003436); PHEV, porcine haemagglutinating encephalomyelitis virus

845 (NC_007732); Pi-BatCoV-HKU5-1, Pipistrellus bat coronavirus HKU5 (NC_009020); PorCoV

846 HKU15, porcine coronavirus HKU15 (NC_016990); PRCV, porcine respiratory coronavirus

847 (DQ811787); RbCoV HKU14, rabbit coronavirus HKU14 (NC_017083); RatCoV parker, rat

848 coronavirus parker (NC_012936); Rs-batCoV HKU2, Rhinolophus bat coronavirus HKU2

849 (EF203064); Ro-batCoV-HKU9, Rousettus bat coronavirus HKU9 (NC_009021); Ro-batCoV

850 HKU10, Rousettus bat coronavirus HKU10 (JQ989270); Human SARS-CoV TOR2, SARS-

851 related human coronavirus (NC_004718); Civet SARS-CoV SZ16, SARS-related palm civet

852 coronavirus (AY304488); Badger SARS-CoV, SARS-related badger coronavirus

853 CFB/SZ/94/03 (AY545919.1); SARSr-Rs-batCoV HKU3, SARS-related Rhinolophus bat

854 coronavirus HKU3 (DQ022305); Scotophilus BatCoV 512, Scotophilus bat coronavirus 512

855 (NC_009657); SpCoV HKU17, sparrow coronavirus HKU17 (NC_016992); TCoV, turkey

856 coronavirus (NC_010800); TGEV, transmissible gastroenteritis virus (DQ443743); ThCoV

857 HKU12, thrush coronavirus HKU12 (FJ376621); Ty-BatCoV-HKU4-1, Tylonycteris bat

858 coronavirus HKU4 (NC_009019); WECoV HKU16, white-eye coronavirus HKU16

859 (NC_016991); WiCoV HKU20, wigeon coronavirus HKU20 (NC_016995).

860 FIG 3 Multiple alignment of the amino acid sequences of the receptor-binding motifs of the 

861 spike proteins of human and civet SARSr-CoV and the corresponding sequences of SARSr- 

862 BatCoVs in different Rhinolophus species. Asterisks indicate positions that have fully 

863 conserved residues. Amino acid deletions among some SARSr-BatCoVs are highlighted yellow. 

864 The five critical residues for receptor binding in human SARS-CoV, at positions

865 442, 472, 479, 487 and 491, are highlighted pink.

866 FIG 4 Phylogenetic analyses of nsp2, nsp3, nsp5, RdRp, S, ORF3, ORF8 and N nucleotide 

867 sequences of SARSr-BatCoVs from different bat species. The trees were constructed by the 

868 maximum likelihood method using (A) GTR+G; (B) GTR+G; (C) GTR+G+I; (D) TN93+G; (E) 

869 GTR+G; (F) TN93+G; (G) T92+G; (H) GTR+G substitution models respectively and bootstrap

870 values calculated from 1000 trees. Except for ORF3 and ORF8, all trees were rooted using

871 corresponding sequences of HCoV HKU1 (GenBank accession number NC_006577). Only

872 bootstrap values >70% are shown. (A) 1736 nt (B) 5019 nt (C) 908 nt (D) 2777 nt (E) 3638 nt

873 (F) 804 nt (G) 345 nt (H) 1222 nt positions respectively were included in the analyses. The

874 scale bars represent (A) 50 (B) 10 (C) 20 (D) 20 (E) 10 (F) 20 (G) 10 (H) 200 substitutions per

875 site respectively. Human and civet SARSr-CoVs are in green, SARSr-Rs-BatCoVs from R.

876 sinicus are in blue and SARSr-Rf-BatCoVs from R. ferrumequinum are in red. The two SARSr-

877 Rf-BatCoV strains YNLF_31C and YNLF_34C detected in this study are bolded.

878 FIG 5 (A) Bootscan (upper panel) and Simplot (lower panel) analysis using the genome 

879 sequence of civet SARSr-CoV strain SZ03 as the query sequence. Bootscanning was conducted 

880 with Simplot version 3.5.1 (F84 model; window size, 1000 bp; step, 200 bp) on a gapless nt 

881 alignment, generated with ClustalX. The red line denotes SARSr-Rf-BatCoV strain YNLF_31C,

882 the blue line denotes SARSr-Rs-BatCoV strain Rs3367 and the black line denotes SARSr-Rs-

883 BatCoV strain HKU3-1. The ORF8 region with potential recombination is highlighted yellow.

884 (B) Multiple alignment of nt sequences from genome position 27000 to 28700. Bases conserved

885 between civet SARSr-CoV SZ03 and SARSr-Rf-BatCoVs (strains YNLF_31C and Rf1) are

886 marked in yellow boxes. Bases conserved between civet SARSr-CoV SZ03 and SARSr-Rs-

887 BatCoVs (strains Rs3367 and HKU3-1) are marked in green boxes. The 29-nt deletion in

888 human SARS coronavirus TOR2 is highlighted orange. The start codon and stop codon of ORF8

889 are labelled with black boxes. 
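The sliding-window scheme described in this legend (window size 1000 bp, step 200 bp, gapless alignment) can be sketched as follows in Python. This is a simplified illustration that reports raw percent identity per window; it does not implement the F84 distance model or the bootstrapping performed by Simplot, and the function name and toy sequences are assumptions.

```python
from typing import List, Tuple


def sliding_identity(query: str, reference: str,
                     window: int = 1000, step: int = 200) -> List[Tuple[int, float]]:
    """Percent identity between two gapless aligned sequences per window.

    Returns (window_start, percent_identity) pairs; windows that would run
    past the end of the alignment are not reported.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        same = sum(1 for a, b in zip(q, r) if a == b)
        results.append((start, 100.0 * same / window))
    return results


# Hypothetical toy alignment: identical first half, divergent second half.
toy_query = "A" * 20
toy_reference = "A" * 10 + "C" * 10
for start, identity in sliding_identity(toy_query, toy_reference, window=10, step=5):
    print(start, identity)
```

A drop in identity to one reference accompanied by a rise in identity to another across neighbouring windows is the kind of pattern that suggests a potential recombination breakpoint.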

890 FIG 6 Estimation of tMRCA of SARSr-CoVs based on ORF1ab (A) and nsp5 (B). The mean

891 estimated dates were labeled. The taxa were labeled with their sampling dates. 

892 FIG 7 SARSr-Rf-BatCoV YNLF_31C mRNA leader-body junction and flanking sequences. The

893 subgenomic ORF8 mRNA sequences are shown in alignment with the leader and the genomic 

894 sequence. The start codon AUG in subgenomic RNA is depicted in red. The putative TRS is 

895 depicted in boldface type and underlined. Identical bases between leader sequence and 

896 subgenomic mRNA sequence are in blue. Identical bases between genome and subgenomic 

897 mRNA sequences are in green. 
898 Table 1. Detection of CoVs in different bat species by RT-PCR of the 440-bp fragment of RdRp gene

Scientific name            Common name                 No. of bats  No. of bats       CoV detected/closest    Nt identity to      Sampling location
                                                       tested       positive for CoV  match in GenBank        closest match (%)   of positive bats
Rhinolophus luctus         Woolly horseshoe bat        32           0                 -                       -                   -
Rhinolophus affinis        Intermediate horseshoe bat  22           0                 -                       -                   -
Rhinolophus ferrumequinum  Greater horseshoe bat       11           2                 SARS-CoV (2)            100                 Lufeng
Rhinolophus stheno         Lesser brown horseshoe bat  34           1                 Rs-BatCoV HKU2 (1)      92                  Mojiang
Hipposideros pomona        Pomona roundleaf bat        17           2                 Hi-BatCoV HKU10 (2)     81-87               Mojiang
Myotis daubentonii         Daubenton’s bat             98           32                My-BatCoV HKU6 (24)     78-99               Xiangyun
                                                                                      Rs-BatCoV HKU2 (1)      93                  Mojiang
                                                                                      Rs-BatCoV HKU2 (4)      80-81               Xiangyun
                                                                                      Mi-BatCoV HKU7 (2)      96                  Mojiang
                                                                                      Mi-BatCoV HKU8 (1)      96                  Mojiang
Rousettus leschenaulti     Leschenault’s rousette      115          9                 Ro-BatCoV HKU9 (9)      75-79               Mengla
Unknown bat species                                    19           0                 -                       -                   -
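The detection rates implied by Table 1 can be derived directly from the numbers of bats tested and positive. The Python sketch below does this per species and overall; the counts are copied from Table 1, while the dictionary layout and function name are illustrative assumptions.

```python
# (No. of bats tested, No. of bats positive for CoV), copied from Table 1.
table1 = {
    "Rhinolophus luctus": (32, 0),
    "Rhinolophus affinis": (22, 0),
    "Rhinolophus ferrumequinum": (11, 2),
    "Rhinolophus stheno": (34, 1),
    "Hipposideros pomona": (17, 2),
    "Myotis daubentonii": (98, 32),
    "Rousettus leschenaulti": (115, 9),
    "Unknown bat species": (19, 0),
}


def positivity_percent(tested: int, positive: int) -> float:
    """Percentage of tested bats that were positive for CoV."""
    if tested <= 0:
        raise ValueError("number tested must be positive")
    return 100.0 * positive / tested


for species, (tested, positive) in table1.items():
    print(f"{species}: {positive}/{tested} = {positivity_percent(tested, positive):.1f}%")

total_tested = sum(tested for tested, _ in table1.values())
total_positive = sum(positive for _, positive in table1.values())
print(f"Overall: {total_positive}/{total_tested} = "
      f"{positivity_percent(total_tested, total_positive):.1f}%")
```

The overall count of positive bats obtained this way matches the 46 positive samples referred to in the legend to Fig. 2.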
900 Table 2. Percentage amino acid identities of the selected predicted gene products of SARSr-CoVs to civet SARSr-CoV strain SZ3

                                nsp2   nsp3   nsp5   nsp12  S      ORF3   E      M      ORF8   N
Civet SARSr-CoV civet007        99.5   99.5   100.0  99.7   98.6   98.1   100.0  100.0  98.3
Civet SARSr-CoV SZ16            100.0  99.9   100.0  99.9   99.9   100.0  100.0  100.0  98.3   100.0
Human SARS-CoV BJ01             99.8   99.6   100.0  99.9   98.8   98.1   100.0  99.5   38.2
Human SARS-CoV GZ02             99.8   99.8   100.0  99.9   99.0   97.8   100.0  99.5   98.3
Human SARS-CoV Tor2             99.8   99.6   100.0  99.9   98.6   98.1   100.0  99.5   37.3   100.0
SARSr-Rs-BatCoV Rs3367          97.8   96.8   100.0  99.6   92.3   96.7   99.1   97.7   32.2
SARSr-Rs-BatCoV RsSHC014        98.3   96.8   99.7   99.6   90.1   96.7   99.1   97.7   33.0   99.5
SARSr-Rs-BatCoV WIV1            97.8   96.8   99.7   99.5   92.3   96.3   99.6   97.7   32.2
SARSr-Rs-BatCoV HKU3-1          90.6   91.7   99.3   98.6   77.9   81.3   97.4   98.2   31.4
SARSr-Rs-BatCoV HKU3-2          90.6   91.7   99.3   98.6   77.8   81.3   96.5   98.2   31.4   96.7
SARSr-Rs-BatCoV HKU3-3          90.6   91.7   99.3   98.6   77.9   81.3   96.1   98.2   31.4
SARSr-Rs-BatCoV HKU3-6          90.6   91.7   99.3   98.5   78.0   81.3   97.4   98.2   31.4   96.4
SARSr-Rs-BatCoV HKU3-8          90.0   91.7   99.0   98.8   78.1   81.7   97.4   96.4   23.2
SARSr-Rs-BatCoV HKU3-12         90.4   91.7   99.3   98.9   78.1   81.7   97.4   98.2   31.4
SARSr-Rs-BatCoV HKU3-13         90.6   91.2   99.3   98.6   78.0   81.0   97.4   98.2   31.4   96.4
SARSr-Rs-BatCoV Rs672/2006      98.3   87.1   99.3   99.7   78.0   89.4   98.7   98.2   32.2
SARSr-Rb-BatCoV BM48-31/BGR     70.8   75.9   94.4   97.7   74.8   69.4   96.5   89.4   -      87.2
SARSr-Rm-BatCoV 279/2005        89.6   90.3   99.7   99.1   78.6   83.2   97.4   96.8   31.7
SARSr-Rm-BatCoV Rm1             89.5   90.0   99.3   92.4   78.7   83.2   97.8   96.8   33.0
SARSr-Rp-BatCoV Rp3             96.7   95.1   99.7   92.8   78.4   83.2   99.6   96.8   33.0   97.9
SARSr-Rp-BatCoV Rp/Shaanxi2011  93.6   93.0   100.0  92.3   79.0   82.1   90.0   96.4   33.0
SARSr-Cp-BatCoV Cp/Yunnan2011   90.8   97.5   100.0  92.2   78.9   89.4   97.0   98.6   31.4   98.1
SARSr-Rf-BatCoV Rf1             90.1   92.0   99.7   91.6   76.5   85.7   96.1   97.3   80.4
SARSr-Rf-BatCoV 273/2005        89.8   92.3   99.7   98.4   76.6   85.7   98.7   97.3   80.4
SARSr-Rf-BatCoV YNLF_31C        95.0   97.1   99.7   99.5   77.3   86.8   97.4   98.2   81.3   98.1
SARSr-Rf-BatCoV YNLF_34C        95.0   97.1   99.7   99.0   77.3   86.8   99.1   98.2   81.3

918 a The high amino acid identities in nsp3 and ORF8 between SARSr-Rf-BatCoVs and civet SARSr-CoV are in bold.
919 Table 3. Non-synonymous and synonymous substitution rates in the coding regions of SARSr-CoVs among different hosts

         SARSr-Rf-BatCoV (n=4)     SARSr-Rs-BatCoV (n=17)    Civet SARSr-CoV (n=18)    Human SARS-CoV (n=122)
Gene     Ka     Ks     Ka/Ks(a)    Ka     Ks     Ka/Ks(a)    Ka     Ks     Ka/Ks(a)    Ka     Ks     Ka/Ks(a)
nsp1     0.013  0.081  0.161       0.003  0.108  0.028       0.000  0.000  -           0.000  0.000  -
nsp2     0.036  0.349  0.103       0.023  0.230  0.100       0.001  0.003  0.333       0.000  0.001  0.000
nsp3     0.030  0.414  0.073       0.018  0.288  0.063       0.001  0.002  0.500       0.004  0.005  0.800
nsp4     0.012  0.391  0.031       0.010  0.222  0.045       0.001  0.002  0.500       0.002  0.002  1.000
nsp5     0.003  0.442  0.007       0.004  0.244  0.016       0.001  0.000  -           0.000  0.001  0.000
nsp6     0.009  0.331  0.027       0.005  0.178  0.028       0.000  0.002  0.000       0.002  0.001  2.000
nsp7     0.018  0.549  0.033       0.000  0.181  0.000       0.002  0.000  -           0.000  0.001  0.000
nsp8     0.004  0.249  0.016       0.003  0.175  0.017       0.001  0.000  -           0.000  0.000  -
nsp9     0.000  0.199  0.000       0.003  0.199  0.015       0.001  0.000  -           0.001  0.000  -
nsp10    0.011  0.355  0.031       0.000  0.158  0.000       0.000  0.000  -           0.002  0.002  1.000
nsp12    0.038  0.109  0.349       0.026  0.076  0.342       0.000  0.003  0           0.001  0.001  1.000
nsp13    0.002  0.347  0.006       0.002  0.199  0.010       0.000  0.003  0           0.001  0.001  1.000
nsp14    0.006  0.485  0.012       0.005  0.270  0.019       0.001  0.003  0.333       0.001  0.001  1.000
nsp15    0.016  0.452  0.035       0.012  0.275  0.044       0.000  0.000  -           0.000  0.001  0
nsp16    0.008  0.306  0.026       0.005  0.277  0.018       0.002  0.002  1.000       0.002  0.003  0.667
S        0.012  0.174  0.070       0.049  0.412  0.119       0.003  0.001  3.000       0.001  0.002  0.500
ORF3     0.012  0.065  0.185       0.041  0.220  0.186       0.002  0.001  2.000       0.072  0.386  0.187
E        0.015  0.070  0.214       0.003  0.037  0.081       0.000  0.000  -           0.001  0.002  0.500
M        0.003  0.096  0.313       0.007  0.097  0.072       0.001  0.002  0.500       0.002  0.001  2.000
ORF8(b)  0.021  0.110  0.190       0.035  0.197  0.178       0.004  0.000  -           0.007  0.002  3.500
N        0.015  0.143  0.105       0.008  0.069  0.116       0.002  0.005  0.400       0.000  0.001  0

920 a Ka/Ks ratios of >0.5 are in bold.

921 b Only ORF8 sequences without deletions were included in analysis.
[FIG 2 image. Clade labels: Alphacoronavirus; Betacoronavirus lineage A; Betacoronavirus lineage D; Betacoronavirus lineage B; Betacoronavirus lineage C; Deltacoronavirus]
[FIG 3 image. Alignment of receptor-binding motif sequences, with amino acid residue positions 442, 472, 479, 487 and 491 marked and a 5-aa deletion and a 12-aa deletion annotated. Rows, with host labels: Human SARS-CoV BJ01, TOR2 and GZ02 (human); Civet SARS-CoV SZ3 and civet007 (civet); SARSr-Rs-BatCoV Rs3367, RsSHC014 and WIV1 (R. sinicus); SARSr-Ra-BatCoV LYRa11 (R. affinis); SARSr-Rm-BatCoV Rm1 and 279/2005 (R. macrotis); SARSr-Rp-BatCoV Rp3 (R. pearsoni); SARSr-Cp-BatCoV Cp/Yunnan2011 (Chaerephon plicata); SARSr-Rp-BatCoV Rp/Shaanxi2011 (R. pusillus); SARSr-Rs-BatCoV Rs672/2006 (R. sinicus); SARSr-Rb-BatCoV BM48-31/BGR/2008 (R. blasii); SARSr-Rs-BatCoV HKU3-1 and HKU3-7 (R. sinicus); SARSr-Rf-BatCoV Rf1, 273/2005, YNLF_31C and YNLF_34C (R. ferrumequinum); Clustal consensus]
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A 

[Figure residue: phylogenetic trees of SARS-CoV and SARSr-BatCoV genes. Panels: A, nsp2; B, nsp3; C, nsp5; D, nsp12; E (title not recoverable); F, ORF3; G, ORF8. Scale bars: 0.05 and 0.1. Bootstrap values are not recoverable.
Taxa: Human SARS-CoV TOR2, BJ01, GZ02; Civet SARS-CoV SZ3, SZ16, civet007; SARSr-Rs-BatCoV HKU3 strains/R. sinicus, WIV1, Rs3367/R. sinicus, RsSHC014/R. sinicus, Rs672/2006/R. sinicus; SARSr-Rf-BatCoV YNLF 31C, YNLF 34C, Rf1, 273/2005 (all R. ferrumequinum); SARSr-Rm-BatCoV Rm1 and 279/2005/R. macrotis; SARSr-Rp-BatCoV Rp3/R. pearsoni and Rp/Shaanxi2011/R. pusillus; SARSr-Cp-BatCoV Cp/Yunnan2011/Chaerephon plicata; SARSr-Rb-BatCoV BM48-31/BGR/R. blasii.]

[Figure residue: multiple nucleotide sequence alignment with position rulers covering 27000 to 27050, 27520 to 27610, and 28300 to 28590. Sequences compared: Rf-BatCoV YNLF 31C, Rf-BatCoV Rf1, SARS-CoV SZ3, SARS-CoV GZ02, SARS-CoV TOR2, Rs-BatCoV Rs3367, Rs-BatCoV HKU3-1. The only intact row is SARS-CoV SZ3 under the 28300 to 28390 ruler:
GGGCAAGGCCAAAACAGCGCCGACCCCAAGGTTTACCCAATAATACTGCGTCTTGGTTCACAGCTCTCACTCAGCATGGCAAGGAGGAACTTAGATTCCC
All other rows are not recoverable.]
[Figure residue: time-scaled phylogenetic trees (partial). First panel (title not recoverable): time axis 1960 to 2010. Panel B, nsp5: year labels 2001, 1996, 1990, 1961.
Taxa in the first panel: Civet SARS-CoV PC4-227/2004, GZ0402/2004, GZ0401/2003, PC4-136/2004, PC4-13/2004, civet020/2004; Human SARS-CoV CUHKW1/2003, CUHKSu10/2003, TW1/2003, CUHKAG02/2003; SARSr-Rf-BatCoV YNLF_34C/2013, 273/2005, Rf1/2004; SARSr-Rm-BatCoV Rm1/2004; SARSr-Rs-BatCoV HKU3-1 to HKU3-13 (2005 to 2007).
Taxa visible in panel B: Human SARS-CoV CUHKAG01/2003, CUHKSu10/2003, GZ02/2003, CUHKW1/2003, TW1/2003; Civet SARS-CoV GZ0401/2003, PC4-227/2004, PC4-136/2004, civet007/2004, GZ0402/2004, civet020/2004, PC4-13/2004, civet010/2004, SZ16/2003.]